This is ETC5521 Assignment 1 by Team Dugong, comprising Abhishek Sinha, Yezi He, Yawen Zhang, and Cuiping Wei.

1 Introduction and motivation

The House Price Index (HPI) is a broad measure of the monthly change in the selling prices of single-family houses, and it tracks house price trends at the state and national levels (Index 2018). The housing market also represents about 15% to 18% of U.S. GDP (NAHB 2019), so a weak or strong housing market can substantially influence the direction of the overall economy.
In recent years, as the U.S. housing market has boomed, the HPI has drawn increasing attention from housing economists. From 1975 to 2018 the HPI went through ups and downs, most notably during the Great Recession that started in 2007 and brought a cold winter to the real estate market. What does the HPI show when it is set against mortgage rates? What is the story linking the HPI, mortgage rates, and recessions? This analysis explores what these three series can tell us about the economy.
In this analysis, R is the only tool for data cleaning and analysis.
The rest of the analysis proceeds as follows. Section 2 presents the data description. Section 3 details the findings in data analysis. The limitations of the analysis are presented in Section 4. Finally, Section 5 provides the conclusions of this analysis.

1.1 Secondary research questions

This analysis aims to explore six secondary questions:

  • How similar is the change in state HPI to the change in national HPI?
  • Is HPI more likely to be higher in prosperous or populous states?
  • Was the annual change in HPI similar to the annual change in mortgage rates?
  • Do lower mortgage rates mean higher HPI? Can long-term mortgage rates predict HPI?
  • Can HPI and mortgage rates provide evidence of a sub-prime crisis?
  • Can we find any house price bubble in the U.S. from 1975 to 2018?

2 Data

This section introduces the data sources and describes each data set.

2.1 Data source

The primary data used in this report come from the GitHub tidytuesday repository (tidytuesday 2019), which contains house price and mortgage data. Four data sets are used for the analysis. The HPI and mortgage rates come from the Freddie Mac House Price Index (FHFA 2020), and the list of U.S. recessions comes from Wikipedia (Wikipedia 2020). We also added U.S. GDP data for the comparative analysis in Section 3.2, obtained from the U.S. Bureau of Economic Analysis (Data 2019).

2.2 Data description

2.2.1 HPI data

The first table contains the House Price Index (HPI) for each state. Index values are available for the nation, the 50 states and the District of Columbia, and more than 380 metropolitan statistical areas (MSAs) in the U.S. The following analysis uses the whole data set.

Variable | Class | Description
--- | --- | ---
year | date - integer | Year
month | date - integer | Month
state | character | US State
price_index | double | Calculated House Price Index - average price changes in repeat sales or refinancings at state level
us_avg | double | Calculated House Price Index - averaged at national level

2.2.2 Mortgage data

The second data set contains Freddie Mac’s mortgage rates. Mortgage rates are an essential factor in the decisions of homebuyers who finance a new home purchase with a mortgage loan. Freddie Mac maintains an extensive data set of mortgage rates covering different mortgage types, such as ‘Fixed rate 30-year mortgage’, ‘Fixed rate 15-year mortgage’ and ‘5-1 Hybrid Adjustable rate mortgage’. Note that the 30-year fixed-rate series begins in 1971, while the 15-year series goes back to 1991 and the 5-1 hybrid adjustable series starts in 2005. The full data structure is shown below. We use only the fixed_rate_30_yr variable in the analysis.

Variable | Class | Description
--- | --- | ---
date | date | Date
fixed_rate_30_yr | double | Fixed rate 30 year mortgage (percent)
fees_and_pts_30_yr | double | Fees and percentage points of the loan amount
fixed_rate_15_yr | double | Fixed rate 15 year mortgage (percent)
fees_and_pts_15_yr | double | Fees and percentage points of the loan amount
adjustable_rate_5_1_hybrid | double | 5-1 Hybrid Adjustable rate mortgage (5 year fixed, then annual adjustable rate)
fees_and_pts_5_1_hybrid | double | Fees and percentage points of the loan amount
adjustable_margin_5_1_hybrid | double | A fixed amount added to the underlying index to establish the fully indexed rate for an ARM.
spread_30_yr_fixed_and_5_1_adjustable | double | Difference in rate between 30 year fixed and 5-1 adjustable

2.2.3 State abbreviation data

This data set contains a character vector of 2-letter abbreviations for the state names in the USA (R Core Team 2020a).

2.2.4 GDP data

This data set covers the Gross Domestic Product (GDP) of the United States as a whole and of each state. GDP measures the income and output of each state economy. We combine the GDP data with a U.S. map in the following analysis.

2.2.5 Recessions data

The final data set includes the dates and details of all U.S. recessions listed on Wikipedia. These periods of recession have a significant impact on the U.S. economy. They may affect many industries and markets and are an essential factor in analyzing house prices over time. A recession can slow down the market and increase unemployment, which leads to lost income and falling wages and ultimately reduces the spending power of potential home buyers. There have been 14 noteworthy recessions throughout U.S. history, including the Great Depression. The structure of this data set is shown below. The first five variables are used in the following sections.

Variable | Class | Description
--- | --- | ---
name | character | Recession Name
period_range | character | Time period range of the recession
duration_months | character | How long the recession lasted
time_since_previous_recession_months | character | Time since previous recession in months
peak_unemploy_ment | character | Peak unemployment (percent)
gdp_decline_peak_to_trough | character | GDP decline from peak to trough
characteristics | character | Paragraph description of the recession

2.3 Limitations of data

There are some limitations regarding the above data sets.

As a first step, we examined the missing values in the mortgage rate data. Figure 2.1 shows that the 15-year fixed and 5-1 hybrid adjustable mortgage rates were completely missing, so we could only use the 30-year fixed mortgage rate.

Figure 2.1: Missing values in the mortgage rate data. Only the 30-year fixed rate has no NA values.
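The missing-value check described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the `mortgage` data frame is a made-up four-row stand-in that reuses three column names from Section 2.2.2, and the report itself may have used a dedicated package for Figure 2.1.

```r
# Toy stand-in for the mortgage table; all values are invented for illustration.
mortgage <- data.frame(
  date = as.Date(c("2018-01-04", "2018-01-11", "2018-01-18", "2018-01-25")),
  fixed_rate_30_yr = c(3.95, 3.99, 4.04, 4.15),
  fixed_rate_15_yr = c(NA, NA, NA, NA_real_),
  adjustable_rate_5_1_hybrid = c(NA, NA, NA, NA_real_)
)

# Share of missing values per column, as a percentage.
missing_pct <- colMeans(is.na(mortgage)) * 100
print(round(missing_pct, 1))

# Columns without any missing value can be analysed directly.
complete_cols <- names(missing_pct)[missing_pct == 0]
print(complete_cols)
```

A column whose share is 100 is unusable, which is the situation the text reports for the 15-year and 5-1 hybrid series.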

Furthermore, the HPI and mortgage data sets do not include data for 2019 and 2020. This affects the following analysis, and in particular it may bias the inference about the real estate bubble.

3 Exploratory Data Analysis

In this section, we work through the six research questions to explore the data and see what the HPI can tell us about the economy.

3.1 How similar is the change in state HPI to the change in national HPI?

Because the national HPI reflects the states, it is worth looking at which states are the driving force behind it and which are struggling. However, analysing the HPI values of all 50 states and the District of Columbia individually is arduous. We therefore decided to look at the HPI by region and compare it with the national HPI. We use the built-in data set ‘state.region’ to divide the states into four regions.
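A minimal sketch of the region mapping, assuming a toy `hpi` table with invented `price_index` values. `state.abb` and `state.region` ship with base R and cover the 50 states only, so the District of Columbia gets no region here. How the report handled that case is not stated.

```r
# Lookup from 2-letter abbreviation to region, built from datasets shipped with R.
region_lookup <- data.frame(
  state = state.abb,
  region = as.character(state.region),
  stringsAsFactors = FALSE
)

# Toy HPI rows; the price_index values are invented for illustration.
hpi <- data.frame(
  year = c(2018, 2018, 2018, 2018, 2018),
  state = c("CA", "NY", "TX", "OH", "DC"),
  price_index = c(200, 160, 150, 120, 210)
)

# Left join: DC is kept, but its region is NA because state.region has no entry for it.
hpi_region <- merge(hpi, region_lookup, by = "state", all.x = TRUE)

# Mean index per year and region; the formula interface drops rows with an NA region.
regional_hpi <- aggregate(price_index ~ year + region, data = hpi_region, FUN = mean)
print(regional_hpi)
```

The regional means can then be plotted against `us_avg` to compare each region with the national level.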

Figure 3.1 presents the four U.S. regions and how state-level HPI values compare with the national level. The West and Northeast regions clearly follow roughly the same path as the national HPI, and in some cases they are slightly above it. Representative states in these regions are California in the West and New York or Connecticut in the Northeast. California has continued to lead the tech industry in recent years (CIO 2016). The South and North Central regions are comparatively lower than the national level, which may be due to their location and smaller populations. They also saw a gradual increase in HPI, but the effect of the housing bubble and the Great Recession was less drastic.

Figure 3.1: Comparison of the four U.S. regions with the national level

3.2 Is HPI more likely to be higher in prosperous states?

This part aims to answer the question: is HPI more likely to be higher in prosperous or populous states? Because the original data span an exceptionally long period, an average HPI over 1975 to 2018 has limitations. Selecting the latest ten years of data therefore answers this question more accurately. Figure 3.2 uses the shade of colour to represent the HPI: darker red indicates a higher HPI and vice versa. The darkest state is North Dakota (ND), which reached an HPI of 180%. Although we cannot conclude that HPI is higher in prosperous regions, the HPI in the western and eastern coastal areas is higher than in inland areas. On the map, the coastal states are darker than inland areas, especially near California and the capital region.

Figure 3.2: The map of House Price Index(HPI) in each state
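The ten-year state average behind a map like Figure 3.2 could be computed along the following lines. This is a sketch with invented numbers for two states, not the report's actual code or values, and the ten-year window is taken relative to the latest year in the toy data.

```r
# Toy state-level HPI; values are invented for illustration.
hpi <- data.frame(
  year = rep(c(2005, 2010, 2018), times = 2),
  state = rep(c("ND", "CA"), each = 3),
  price_index = c(100, 140, 200, 180, 130, 190)
)

# Keep only the latest ten years available in the data.
latest_year <- max(hpi$year)
recent <- hpi[hpi$year > latest_year - 10, ]

# Average index per state over that window, highest first.
state_avg <- aggregate(price_index ~ state, data = recent, FUN = mean)
state_avg <- state_avg[order(-state_avg$price_index), ]
print(state_avg)
```

The resulting one-row-per-state table is what a choropleth map would colour by.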

For a more comprehensive analysis, GDP data make the comparison more convincing, since prosperous states should have higher GDP on average. As with HPI, averaging GDP over 1997 to 2019 has limitations, so we use the same ten-year window as for HPI. In Figure 3.3, the most prosperous state is California, but its HPI is not the highest. In contrast, North Dakota (ND) reached the highest HPI in the latest decade, but its GDP is relatively low. Therefore, we cannot conclude that HPI is more likely to be higher in prosperous states. However, Figure 3.2 and Figure 3.3 suggest that both GDP and HPI are relatively higher in coastal areas than in inland regions.

Figure 3.3: Gross Domestic Product(GDP) in each state

3.3 Was the annual change in HPI index similar to the annual change in mortgage rates?

Are the HPI and the mortgage rate rising or falling? Are their trends consistent? That would be a good start to explore these two economic indices.

Therefore, we calculated the annual change in HPI and in mortgage rates from 1975 to 2018. Figure 3.4 shows an interesting pattern: the annual change in mortgage rates was negative in most years and positive in only a few. In contrast, HPI declined continuously from 2007 to 2011, while its annual change was positive in the remaining years. That is, the annual trends of the two seem inconsistent, except that both were negative from 2007 to 2011, a period that includes the Great Recession (Table 3.1).

Figure 3.4: Mortgage annual changes vs. HPI annual changes
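One way to compute such annual changes is a year-over-year percentage change on yearly averages. The sketch below assumes that definition (the report does not spell out its formula) and uses invented yearly values. `pct_change` is a hypothetical helper, not a function from any package.

```r
# Toy yearly averages; values are invented for illustration.
annual <- data.frame(
  year = 2006:2010,
  hpi = c(180, 178, 165, 155, 150),
  mortgage_rate = c(6.4, 6.3, 6.0, 5.0, 4.7)
)

# Year-over-year percentage change; the first year has no predecessor, so it is NA.
pct_change <- function(x) c(NA, diff(x) / head(x, -1) * 100)

annual$hpi_change <- pct_change(annual$hpi)
annual$rate_change <- pct_change(annual$mortgage_rate)

# TRUE where both series moved in the same direction in a given year.
annual$same_direction <- sign(annual$hpi_change) == sign(annual$rate_change)
print(annual)
```

Comparing the signs year by year is a simple way to check whether the two trends are consistent, as the paragraph above does visually.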

3.4 Do lower mortgage rates mean higher HPI? Can long-term mortgage rates predict HPI?

Mortgage rates and HPI are two separate measures calculated from different inputs, but the relationship between them can give interesting insights. Our analysis focuses on the 30-year fixed-rate mortgage, as it is the more popular product among U.S. home buyers and offers a complete picture from the start of the mortgage rate series.

In the short term, the factors affecting mortgage rates and house prices are different, and mortgage rates affect home prices only indirectly. Specifically, mortgage rates tend to decrease when the economy is weak, the market is relatively unhealthy, and wages are declining. A significant factor causing home prices to rise is a shortage of entry-level construction (Bankrate 2018). Since building materials are increasing in cost, builders are more willing to invest in high-end properties, and intense competition for entry-level homes leads to higher prices.

In Figure 3.5, we can see that the House Price Index increased over time, while mortgage rates fell sharply from 1971 to 2018. This suggests a strong long-term relationship in which higher rates go with lower home prices. The pattern follows simple economics: as the economy improves, people have more money to buy houses. Because people mostly buy homes with mortgages, low rates mean that financing a house has become more accessible, which indicates that banks have enough reserves to offer mortgage loans at such low rates. Housing demand then rises, which eventually pushes house prices up. Therefore, in the long term, changes in mortgage rates could be used to predict HPI. Figure 3.6 shows a correlation of -0.792, which means that mortgage rates and HPI have a strong negative relationship.

Figure 3.5: The changes between Mortgage rates and HPI

Figure 3.6: The relationship between HPI and Mortgage rate
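A correlation like the one reported for Figure 3.6 can be obtained with `cor()` and checked with `cor.test()`. The sketch below uses a handful of invented paired observations, so its coefficient will not match the report's -0.792. It also assumes a Pearson correlation, which the report does not state explicitly.

```r
# Toy paired observations; values are invented for illustration.
paired <- data.frame(
  mortgage_rate = c(13.7, 10.2, 8.4, 7.9, 6.5, 5.9, 4.5, 3.9),
  hpi = c(60, 75, 85, 100, 130, 175, 150, 190)
)

# Pearson correlation coefficient between the two series.
r <- cor(paired$mortgage_rate, paired$hpi, method = "pearson")

# cor.test() adds a confidence interval and a p-value for the same coefficient.
test <- cor.test(paired$mortgage_rate, paired$hpi)

print(r)
print(test$conf.int)
```

Because both series trend strongly over time, a correlation on levels largely reflects those shared trends, so it is worth reading alongside the annual changes in Section 3.3.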

3.5 Can HPI and mortgage rates provide evidence of a subprime crisis?

Table 3.1: Recessions and crises in the U.S.

name | start | end | peak unemployment rate
--- | --- | --- | ---
Great Depression | 1929-08-01 | 1933-03-01 | 21.3% (1932) – 24.9% (1933)
Recession of 1937–1938 | 1937-05-01 | 1938-06-01 | 17.8% – 19.0% (1938)
Recession of 1945 | 1945-02-01 | 1945-10-01 | 5.2%
Recession of 1949 | 1948-11-01 | 1949-10-01 | 7.9% (Oct 1949)
Recession of 1953 | 1953-07-01 | 1954-05-01 | 6.1% (Sep 1954)
Recession of 1958 | 1957-08-01 | 1958-04-01 | 7.5% (July 1958)
Recession of 1960–61 | 1960-04-01 | 1961-02-01 | 7.1% (May 1961)
Recession of 1969–70 | 1969-12-01 | 1970-11-01 | 6.1% (Dec 1970)
1973–75 recession | 1973-11-01 | 1975-03-01 | 9.0% (May 1975)
1980 recession | 1980-01-01 | 1980-07-01 | 7.8% (July 1980)
1981–1982 recession | 1981-07-01 | 1982-11-01 | 10.8% (Nov 1982)
Early 1990s recession | 1990-07-01 | 1991-03-01 | 7.8% (June 1992)
Early 2000s recession | 2001-03-01 | 2001-11-01 | 6.3% (June 2003)
Great Recession | 2007-12-01 | 2009-06-01 | 10.0% (October 2009)
COVID-19 recession | 2020-02-01 | NA | 14.7% (April 2020)

Changes in HPI and mortgage rates are closely related to economic activity. But is there a correlation with recessions? Analyzing the effect of recessions on HPI and mortgage rates using financial data is outside this report’s scope. However, the recession data scraped from Wikipedia let us look at the behaviour of HPI and mortgage rates during recessions.

Figure: The trend for recessions, national HPI and mortgage rates. The red line represents the mortgage rates, the green line represents HPI, and the shaded blue area represents the duration of each recession.

The figure above shows the trends for the U.S. average HPI and mortgage rates, together with the recession periods in the U.S. It shows that the HPI kept rising from 1975 to 2006, even though the American economy experienced several recessions in this period, which may seem strange. It indicates that with the development of the economy, urban expansion, and population increase, people’s demand for housing keeps increasing (Kulish, Richards, and Gillitzer 2012). However, the HPI dropped significantly between 2007 and 2009 during the Great Recession. The sub-prime mortgage crisis led to a sustained economic downturn, with the unemployment rate reaching 10%, leading to a sharp reduction in demand for houses (Lee and Painter 2013).

When we turn to mortgage rates, we find trends that differ from the HPI. Mortgage rates fluctuated drastically during the 1980 recession and the 1981–1982 recession, indicating an association between them. Mortgage rates also rose slightly and then decreased during the Great Recession from December 2007 to June 2009. This suggests that mortgage rates tend to rise and then fall during a recession, because the government uses monetary policy to adjust interest rates and promote economic recovery (Azis 2010).
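To compare a monthly series with the recession periods, each month can be flagged as inside or outside a recession. The sketch below takes two start and end dates from Table 3.1. The variable names and the monthly grid are assumptions for illustration, not the report's code.

```r
# Two recessions with start and end dates as listed in Table 3.1.
recessions <- data.frame(
  name = c("Early 2000s recession", "Great Recession"),
  start = as.Date(c("2001-03-01", "2007-12-01")),
  end = as.Date(c("2001-11-01", "2009-06-01"))
)

# Monthly grid around the Great Recession (an illustrative window).
month_seq <- seq(as.Date("2007-10-01"), as.Date("2009-08-01"), by = "month")

# A month is flagged if it falls inside any recession interval (bounds inclusive).
in_recession <- rep(FALSE, length(month_seq))
for (i in seq_len(nrow(recessions))) {
  inside <- month_seq >= recessions$start[i] & month_seq <= recessions$end[i]
  in_recession <- in_recession | inside
}

flags <- data.frame(month = month_seq, in_recession = in_recession)
print(flags)
```

The flag can be joined to the HPI and mortgage series by month, either to shade recession periods in a plot or to summarise each series inside and outside recessions.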

3.6 Can we find any house price bubble in the U.S. from 1975 to 2018?

As Kindleberger (1987) states, a bubble is a surge in asset prices that is expected to continue. However, the boom always reverses afterward and causes a sharp decline, which may spark a financial crisis. It is interesting to explore whether there was a bubble in the U.S. housing market from 1975 to 2018. We provide some insights from historical trends in HPI.

Figure 3.7 shows that the U.S. experienced housing bubble 1 from 2000 to 2012. House prices rose continuously from 2001, peaked in 2006, and then declined for six years. The housing bubble was the result of multiple factors, including low mortgage rates (Aggarwal 2012). As the recession trend figure in Section 3.5 shows, the overall mortgage rate has trended downward since 2000 and remained low, which further supports the view that low mortgage rates were one of the main factors leading to housing bubble 1 and the sub-prime mortgage crisis.

However, HPI rose continuously again from 2012 to 2018. Mortgage rates also kept decreasing after 2010 and remained below 5%. Will housing bubble 2 emerge? Does the blue line in Figure 3.7 predict the future scenario? We think housing bubble 2 is highly likely to happen.

We are experiencing the COVID-19 recession in 2020, and the peak unemployment rate in the U.S. reached 14.7% in April 2020. High unemployment lowers expected incomes and reduces the demand for housing in a bust (Krivenko and others 2018), which implies a decline in future HPI and the occurrence of housing bubble 2.

Figure 3.7: Recession and National HPI. The occurrence of the US housing bubble from 1975 to 2018 and the scenarios that may present in the future.
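The rise-then-fall pattern described for bubble 1 can be located with a simple drawdown calculation, which measures how far the index sits below its running peak. This is a descriptive heuristic introduced here for illustration, not a formal bubble test and not necessarily the method behind Figure 3.7. The index values are invented but shaped to peak in 2006 as the text describes.

```r
# Toy national index; values are invented for illustration.
hpi <- data.frame(
  year = 2000:2012,
  us_avg = c(100, 108, 116, 125, 140, 158, 170, 168, 155, 145, 140, 135, 133)
)

# Highest value seen so far, and the percentage gap to that running peak.
running_peak <- cummax(hpi$us_avg)
hpi$drawdown_pct <- (hpi$us_avg / running_peak - 1) * 100

# Year of the peak and year of the deepest drawdown after it.
peak_year <- hpi$year[which.max(hpi$us_avg)]
trough_year <- hpi$year[which.min(hpi$drawdown_pct)]

print(hpi)
print(c(peak = peak_year, trough = trough_year))
```

A long climb followed by a deep, multi-year drawdown is the shape the section calls a bubble. Whether a later climb will end the same way cannot be read from the drawdown alone.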

4 Limitations of analysis

This section introduces the two main limitations of this analysis.
- Only the 30-year fixed mortgage rate could be used in this analysis, so we could not analyze the relationship between HPI and the 15-year fixed rate or the 5-1 hybrid adjustable rate. Therefore, the correlation between HPI and mortgage rates is not fully convincing, and more data are needed to confirm it.
- The analysis of the housing bubble is extrapolated from trends in historical HPI data and recession data. We cannot guarantee the accuracy of the prediction of bubble 2 and the future scenario. We only make reasonable inferences from the data, and the actual outcome may deviate from them.

5 Conclusion

This research explores the relationships between HPI, mortgage rates, GDP and recessions. From the above analysis, we find that the HPI of the West and Northeast of the United States is slightly higher than the national HPI, but interestingly, the states with high GDP in the West and Northeast are not the same as the states with high HPI. Moreover, we find no apparent relationship between mortgage rates and HPI in the short term, but a clear negative correlation in the long term. Interestingly, HPI was not visibly affected by the earlier recessions. However, the 2007-2009 Great Recession brought a sharp decline in the HPI, which marked the bursting of the first housing bubble in the U.S. housing market. From the subsequent HPI trend and the COVID-19 recession in 2020, we infer that the U.S. housing market is currently experiencing a second housing bubble.

For future research, it will be exciting to keep digging deeper into the housing market and research more data to find out if the U.S. housing market is experiencing a second housing bubble.

Acknowledgments

The authors would like to thank the tidytuesday organization for providing the data. We also thank all the contributors to the following R packages: Wickham et al. (2019), Wickham (2016), Wickham, Hester, and Francois (2018), Müller (2017), R Core Team (2020b), Wickham et al. (2020), Grolemund and Wickham (2011), Schloerke et al. (2020), Firke (2020), Venables and Ripley (2002), Zhu (2019), Tierney et al. (2020), Pedersen (2020), Sievert (2020), Wickham and Seidel (2020), Xie (2020).

References

Aggarwal, Vijita. 2012. “The Causes and the Effects of the ‘Housing Bubble’ and the ‘Real Estate Crisis’.” Research Journal of Social Science and Management 2 (January).

Azis, Iwan J. 2010. “Predicting a Recovery Date from the Economic Crisis of 2008.” Socio-Economic Planning Sciences 44 (3): 122–29.

Bankrate. 2018. “Do Rising Mortgage Rates Trigger Lower House Prices?” https://www.bankrate.com/finance/mortgages/rising-rates-lower-house-prices.aspx.

CIO. 2016. “The State of the Industry: The Biggest Regional Tech Trends of 2016.” https://www.cio.com/article/3190271/the-state-of-the-industry-the-biggest-regional-tech-trends-of-2016.html.

Data, Regional. 2019. “GDP and Personal Income.” US Bureau of Economic Analysis. Available online: https://apps.bea.gov/itable/iTable.cfm.

Firke, Sam. 2020. Janitor: Simple Tools for Examining and Cleaning Dirty Data. https://CRAN.R-project.org/package=janitor.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.

Index, House Price. 2018. “Federal Housing Finance Agency.” Retrieved from.

Kindleberger, Charles. 1987. “Bubbles, the New Palgrave: A Dictionary of Economics, John Eatwell, Murray Milgate, and Peter Newman, Eds.” New York: Stockton Press.

Krivenko, Pavel, and others. 2018. “Unemployment and the US Housing Market During the Great Recession.” In 2018 Meeting Papers. Vol. 579. Society for Economic Dynamics.

Kulish, Mariano, Anthony Richards, and Christian Gillitzer. 2012. “Urban Structure and Housing Prices: Some Evidence from Australian Cities.” Economic Record 88 (282): 303–22.

Lee, Kwan Ok, and Gary Painter. 2013. “What Happens to Household Formation in a Recession?” Journal of Urban Economics 76: 93–109.

Müller, Kirill. 2017. Here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.

NAHB. 2019. “National Association of Home Builders.”

Pedersen, Thomas Lin. 2020. Patchwork: The Composer of Plots. https://CRAN.R-project.org/package=patchwork.

R Core Team. 2020a. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

———. 2020b. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Schloerke, Barret, Di Cook, Joseph Larmarange, Francois Briatte, Moritz Marbach, Edwin Thoen, Amos Elberg, and Jason Crowley. 2020. GGally: Extension to ’Ggplot2’. https://CRAN.R-project.org/package=GGally.

Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, Plotly, and Shiny. Chapman; Hall/CRC. https://plotly-r.com.

tidytuesday. 2019. “Tidytuesday-House and Mortgage Data.” https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-02-05.

Tierney, Nicholas, Di Cook, Miles McBain, and Colin Fay. 2020. Naniar: Data Structures, Summaries, and Visualisations for Missing Data. https://CRAN.R-project.org/package=naniar.

Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth. New York: Springer. http://www.stats.ox.ac.uk/pub/MASS4/.

Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2020. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

Wickham, Hadley, Jim Hester, and Romain Francois. 2018. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.

Wickham, Hadley, and Dana Seidel. 2020. Scales: Scale Functions for Visualization. https://CRAN.R-project.org/package=scales.

Wikipedia. 2020. “List of Recessions in the United States.” https://en.wikipedia.org/wiki/List_of_recessions_in_the_United_States.

Xie, Yihui. 2020. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.

Zhu, Hao. 2019. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.